Overview:
- Inserting a single record or a large batch of records into a database server is a common operation. For example, when a user signs up for a web service, the user's details are stored in a database server.
- In MongoDB, a JSON-like document corresponds to a row of data in a relational table, that is, a single data record.
- MongoDB stores these documents in BSON (binary JSON) format inside collections. A MongoDB collection plays the role that a table plays in an RDBMS.
- Because every document carries its own field names alongside its values, MongoDB does not enforce a single schema per collection by default. Technically, documents with different structures can be inserted into the same collection.
- However, it is usually better to keep documents with different structures in separate, dedicated collections.
Inserting documents into a MongoDB collection:
- Import the pymongo module.
- Create an instance of pymongo.MongoClient.
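- MongoClient takes the host and the port number of the MongoDB server as parameters, or a single connection URI such as mongodb://localhost:27017/.
- From the MongoClient instance, obtain a database instance.
- The collections of a MongoDB database are available as attributes of the database instance.
- Using object.attribute notation, access the collection and call insert_one() on the collection instance (or insert_many() to insert several documents in one call). The older insert() method was deprecated in PyMongo 3.0 and removed in PyMongo 4.0.
- insert_one() takes a document as a Python dict, not as a JSON string, and inserts it into the specified MongoDB collection.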
Example:
```python
# Import MongoClient from pymongo
from pymongo import MongoClient

# Get a MongoClient object
connectionObject = MongoClient('mongodb://localhost:27017/')

# Access the database using object.attribute notation
databaseObject = connectionObject.sample

# Access the MongoDB collection using object.attribute notation
collectionObject = databaseObject.test

# Insert simple documents into the test collection
# (insert_one() replaces insert(), which PyMongo 4.0 removed)
collectionObject.insert_one({"red": 123, "green": 223, "blue": 23})
collectionObject.insert_one({"red": 146, "green": 46, "blue": 246})

# Using find(), query all the documents in the collection
for document in collectionObject.find():
    # Print each document
    print(document)
```
Output:
```
{'_id': ObjectId('5ab9fb8302334a031cc2fc13'), 'red': 123, 'green': 223, 'blue': 23}
{'_id': ObjectId('5ab9fb8302334a031cc2fc14'), 'red': 146, 'green': 46, 'blue': 246}
```
Inserting multiple documents into MongoDB concurrently using MongoClient's socket pool:
- A MongoClient manages a pool of one to n connections from a process to each MongoDB server it talks to.
- The maximum size of each pool is set by maxPoolSize, which defaults to 100.
- Within a process, one or more threads can share the pool. At most maxPoolSize connections are in use at a time; a thread that finds all of them busy waits for one to be returned, for at most waitQueueTimeoutMS milliseconds (by default it waits indefinitely). PyMongo 3.x also accepted waitQueueMultiple, which limited the number of waiting threads to waitQueueMultiple times maxPoolSize; PyMongo 4.0 removed this option.
- When a thread cannot obtain a connection within waitQueueTimeoutMS (or, in PyMongo 3.x, when the wait queue is already full), MongoClient raises an exception instead of running the operation.
- The Python example below inserts 400 documents concurrently from 400 threads that share one MongoClient and its connection pool.
Example:
```python
from threading import Thread

from pymongo import MongoClient

THREAD_COUNT = 400


# Derive from threading.Thread to create a specialised insert thread
class DataInsertThread(Thread):
    def __init__(self, database_in, threadNumber):
        super().__init__()
        self.database = database_in
        self.threadNumber = threadNumber

    def run(self):
        # Each insert checks a connection out of the shared pool and returns it
        self.database.test.insert_one({"Data inserted by thread": self.threadNumber})


# Get a MongoClient instance
mongoClient = MongoClient("mongodb://localhost:27017/",
                          maxPoolSize=200,         # connection pool size is 200
                          waitQueueTimeoutMS=200)  # how long (ms) a thread may wait for a connection
# waitQueueMultiple was accepted only by PyMongo 3.x; PyMongo 4.0 removed it

# Get the database object
databaseObject = mongoClient.sample

insertThreads = []

# Create and start the insert threads
for threadNum in range(THREAD_COUNT):
    insertThread = DataInsertThread(databaseObject, threadNum)
    insertThreads.append(insertThread)
    insertThread.start()

# Wait till all the insert threads are complete
for insertThread in insertThreads:
    insertThread.join()
```
Output:
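To check the record count after running the example, open the MongoDB shell, switch to the sample database, and count the documents in the test collection (400, assuming the collection was empty beforehand):
```
> use sample
> db.test.countDocuments({})
400
```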
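The example relies on every waiting thread getting a connection within waitQueueTimeoutMS; if the inserts were slower, some threads would raise an exception and the count would fall short of 400. That effect can be reproduced without a server using the toy model below, a standard-library sketch of a bounded pool with a timed wait. `ToyPool` and `simulate` are hypothetical names, and this is not how PyMongo implements its pool.

```python
import threading
import time


class ToyPool:
    """Hypothetical bounded pool: at most max_pool_size holders at once."""

    def __init__(self, max_pool_size, wait_timeout):
        self._slots = threading.BoundedSemaphore(max_pool_size)
        self._wait_timeout = wait_timeout  # seconds, like waitQueueTimeoutMS / 1000

    def run(self, work):
        # Wait for a free slot, but give up after wait_timeout seconds
        if not self._slots.acquire(timeout=self._wait_timeout):
            raise TimeoutError("timed out waiting for a pooled connection")
        try:
            return work()
        finally:
            self._slots.release()  # hand the slot to the next waiting thread


def simulate(thread_count, max_pool_size, wait_timeout, work_seconds):
    """Run thread_count threads through the pool; return (succeeded, timed_out)."""
    pool = ToyPool(max_pool_size, wait_timeout)
    counts = {"ok": 0, "timeout": 0}
    lock = threading.Lock()

    def worker():
        try:
            pool.run(lambda: time.sleep(work_seconds))  # stands in for one insert
            outcome = "ok"
        except TimeoutError:
            outcome = "timeout"
        with lock:
            counts[outcome] += 1

    threads = [threading.Thread(target=worker) for _ in range(thread_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counts["ok"], counts["timeout"]


# Quick work: waiting threads normally get a slot within the timeout
print(simulate(400, 200, wait_timeout=0.2, work_seconds=0.01))
# Slow work: threads beyond the pool size give up after 0.2 s
print(simulate(400, 200, wait_timeout=0.2, work_seconds=0.5))
```

On a typical machine the first call reports all 400 threads succeeding, while in the second roughly the 200 threads that could not get a slot time out. Raising the timeout or the pool size, or reducing the number of threads, avoids the failures.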